
102 research outputs found

A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation

Author: El-Mabrouk Nadia
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 17th International Workshop on Algorithms in Bioinformatics (WABI 2017)
Publication date: 01/01/2017

Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible in the available trees. As phylogenetic tools are error-prone, it is common to apply a correction method to improve an initial tree. Various correction methods exist. In this paper we focus on those based on the Duplication-Loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the resulting non-binary tree so as to minimize a reconciliation distance with the given species tree. The supertree approach, on the other hand, takes as input a set of separate subtrees, obtained either for separate orthology groups or by removing the upper branches of an initial tree down to a certain level, and amalgamates them in an optimal way that preserves the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a 0-1 edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC Problem, whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models make it possible to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms to handle these two general correction problems.

Dagstuhl Research Online Publication Server
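
The abstract above treats a tree as a set of triplets. As a rough illustration of that idea only (not the LabelGTC or TripletGTC algorithms themselves), the sketch below lists the rooted triplets ab|c displayed by a rooted tree. The nested-tuple tree encoding and the name `rooted_triplets` are assumptions made for this example, and leaf labels are assumed to be distinct.

```python
from itertools import combinations


def rooted_triplets(tree):
    """Return the rooted triplets ab|c displayed by a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple of
    children. Each triplet is returned as (frozenset({a, b}), c), meaning
    that a and b are closer to each other than either is to c.
    """
    triplets = set()

    def walk(node):
        # Returns the list of leaf labels below `node`.
        if not isinstance(node, tuple):
            return [node]
        child_leaves = [walk(child) for child in node]
        for i, inside in enumerate(child_leaves):
            # Leaves hanging from the other children of this node.
            outside = [
                leaf
                for j, other in enumerate(child_leaves)
                if j != i
                for leaf in other
            ]
            # Two leaves in the same child subtree, one leaf outside it.
            for a, b in combinations(inside, 2):
                for c in outside:
                    triplets.add((frozenset((a, b)), c))
        return [leaf for leaves in child_leaves for leaf in leaves]

    walk(tree)
    return triplets
```

A fully unresolved node (a polytomy) over three leaves displays no triplet, which is why contracting a branch can be read as dropping constraints on the corrected tree.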

Boyer-Moore strategy to efficient approximate string matching

Author: Crochemore Maxime
El Mabrouk Nadia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1996

We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit number and uses the ability of programming languages to handle bit words. The state representation must therefore not exceed the word size w, that is, m(⌈log2(k+1)⌉+1) ≤ w. The algorithm consists of a preprocessing step and a searching step. It is linear and performs 3n operations during the searching step. The notions of shift and character skip found in the Boyer-Moore (BM) approach [9] are introduced in this algorithm. Provided that the alphabet is large enough compared to the pattern length, the average number of operations performed by our algorithm during the searching step becomes n(2+(k+4)/(m-k)).

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
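
The Shift-Add state update mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the basic Baeza-Yates and Gonnet scheme under stated assumptions, without the Boyer-Moore style shifts and character skips that the paper adds. The function name `shift_add_search` is hypothetical, and because Python integers are unbounded, the word-size condition m(⌈log2(k+1)⌉+1) ≤ w is not enforced here.

```python
def shift_add_search(pattern, text, k):
    """Return start positions where `pattern` matches `text` with <= k mismatches.

    Field j of the state (b bits wide) counts the mismatches between
    pattern[0..j] and the text window ending at the current character.
    """
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    b = k.bit_length() + 1                 # ceil(log2(k+1)) + 1 bits per field
    full = (1 << (m * b)) - 1              # keeps exactly m fields
    ones = sum(1 << (j * b) for j in range(m))
    high = ones << (b - 1)                 # top (overflow) bit of every field

    # Preprocessing: table[c] has a 1 in field j when pattern[j] != c.
    table = {}
    for j, ch in enumerate(pattern):
        table[ch] = table.get(ch, ones) & ~(1 << (j * b))

    last = (m - 1) * b                     # bit offset of the last field
    low_mask = (1 << (b - 1)) - 1
    state = overflow = 0
    hits = []
    for i, ch in enumerate(text):
        # Shift every count one field up, then add the new mismatches.
        state = ((state << b) + table.get(ch, ones)) & full
        # Remember fields that reached 2^(b-1) > k, then clear their top bit
        # so that the next addition cannot carry into a neighbouring field.
        overflow = ((overflow << b) | (state & high)) & full
        state &= ~high
        if i >= m - 1:
            count = (state >> last) & low_mask
            overflowed = (overflow >> (last + b - 1)) & 1
            if not overflowed and count <= k:
                hits.append(i - m + 1)
    return hits
```

For example, `shift_add_search("abc", "xabcxabdxaxc", 1)` reports the exact occurrence at position 1 together with the one-mismatch occurrences at positions 5 and 9.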

Non-Binary Tree Reconciliation with Endosymbiotic Gene Transfer

Author: El-Mabrouk Nadia
Gascon Mathieu
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Publication date: 01/01/2022

Gene transfer between the mitochondrial and nuclear genome of the same species, called endosymbiotic gene transfer (EGT), is a mechanism which has largely shaped gene contents in eukaryotes since a unique ancestral endosymbiotic event known to be at the origin of all mitochondria. The gene tree-species tree reconciliation model has recently been extended to account for EGTs: given a binary gene tree and a binary species tree, the EndoRex software outputs an optimal DLE-Reconciliation, that is, an embedding of the gene tree into the species tree inducing a most parsimonious history of Duplications, Losses and EGT events. Here, we provide the first algorithmic study of DLE-Reconciliation in the case of a multifurcated (non-binary) gene tree. We present a general two-step method: first, ignoring the mitochondrial-nuclear (or 0-1) labeling of leaves, output a binary resolution minimizing the DL-Reconciliation cost; then, for each resolution, assign a known number of 0s and 1s to the leaves so as to minimize EGT events. While Step 1 corresponds to the well-studied non-binary DL-Reconciliation problem, the complexity of the formal label assignment problem related to Step 2 is unknown. Here, we show that it is NP-complete even for a single polytomy (non-binary node). We then provide a heuristic which is exact for the unitary cost of operations, and a polynomial-time algorithm for solving a polytomy in the special case where genes are specific to a single genome (mitochondrial or nuclear) in all but one species.

Dagstuhl Research Online Publication Server

Efficient Non-Binary Gene Tree Resolution with Weighted Reconciliation Cost

Author: El-Mabrouk Nadia
Lafond Manuel
Noutahi Emmanuel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016

Polytomies in gene trees are multifurcated nodes corresponding to unresolved parts of the tree, usually due to insufficient differentiation between sequences of homologous gene copies. Apart from gene sequences, other information such as that contained in the species tree can be used to resolve such intricate parts of a gene tree. The problem of resolving a multifurcated tree has been considered by many authors, the objective function often being the number of duplications and losses reflected by the reconciliation of the resolved gene tree with the species tree. Here, we present PolytomySolver, an algorithm accounting for a more general model allowing different costs for duplications and losses per species. The time complexity of this algorithm is linear for the unit cost and is quadratic for the general cost, which outperforms the best known solutions so far by a linear factor. We show on simulated trees that the gain in theoretical complexity has a real practical impact on running times

Dagstuhl Research Online Publication Server
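
The objective function mentioned in the abstract above, the number of duplications and losses induced by reconciling a gene tree with a species tree, can be illustrated for an already binary gene tree with the classical LCA mapping. This is a sketch under stated assumptions, not PolytomySolver: it uses unit costs, does not resolve polytomies, encodes trees as nested tuples, and labels each gene-tree leaf directly with the name of its species. The names `index_species_tree`, `lca` and `dl_cost` are hypothetical.

```python
def index_species_tree(tree):
    """Map every node of the species tree to its parent and its depth."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def dl_cost(gene_tree, species_tree):
    """Return (duplications, losses) of the LCA reconciliation.

    Assumes a binary gene tree whose leaves are species names that all
    occur as leaves of the species tree.
    """
    parent, depth = index_species_tree(species_tree)
    dups = losses = 0

    def walk(node):
        nonlocal dups, losses
        if not isinstance(node, tuple):
            return node                    # a gene leaf maps to its species
        left, right = (walk(child) for child in node)
        here = lca(left, right, parent, depth)
        # Number of species-tree edges between this node's image and the
        # images of its two children.
        gaps = (depth[left] - depth[here]) + (depth[right] - depth[here])
        if here == left or here == right:
            dups += 1                      # duplication node
            losses += gaps
        else:
            losses += gaps - 2             # speciation node
        return here

    walk(gene_tree)
    return dups, losses
```

With the species tree `(("A", "B"), "C")`, the gene tree `(("A", "C"), ("B", "C"))` requires one duplication at the root and two losses, one in B and one in A.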

The complexity of comparing multiply-labelled trees by extending phylogenetic-tree metrics

Author: El-Mabrouk Nadia
Huber Katharina T
Lafond Manuel
Moulton Vincent
Publication venue: 'Elsevier BV'
Publication date: 15/03/2018

A multilabelled tree (or MUL-tree) is a rooted tree in which every leaf is labelled by an element from some set, but in which more than one leaf may be labelled by the same element of that set. In phylogenetics, such trees are used in biogeographical studies, to study the evolution of gene families, and also within approaches to construct phylogenetic networks. A multilabelled tree in which no leaf-labels are repeated is called a phylogenetic tree, and one in which every label is the same is also known as a tree-shape. In this paper, we consider the complexity of computing metrics on MUL-trees that are obtained by extending metrics on phylogenetic trees. In particular, by restricting our attention to tree-shapes, we show that computing the metric extension on MUL-trees is NP-complete for two well-known metrics on phylogenetic trees, namely, the path-difference and Robinson-Foulds distances. We also show that the extension of the Robinson-Foulds distance is fixed-parameter tractable with respect to the distance parameter. The path-difference complexity result also allows us to answer an open problem concerning the complexity of solving the quadratic assignment problem for two matrices that are a Robinson similarity and a Robinson dissimilarity, which we show to be NP-complete. We conclude by considering the extension of the maximum agreement subtree (MAST) distance on phylogenetic trees to MUL-trees. Although its extension to MUL-trees can be computed in polynomial time, we show that computing its natural generalization to more than two MUL-trees is NP-complete, although fixed-parameter tractable in the maximum degree when the number of given trees is bounded.

arXiv.org e-Print Archive

University of East Anglia digital repository
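
For context on the Robinson-Foulds distance that the abstract above extends, here is a minimal sketch of the classical distance between two rooted phylogenetic trees, that is, trees with no repeated leaf label. It does not cover the MUL-tree extension, which the abstract shows to be NP-complete to compute. The nested-tuple encoding and the names `clades` and `rf_distance` are assumptions made for this example.

```python
def clades(tree):
    """Return the clades (leaf sets of internal nodes) of a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple of
    children. The root clade is omitted because any two trees on the same
    leaf set share it.
    """
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


def rf_distance(t1, t2):
    """Size of the symmetric difference between the two clade sets."""
    return len(clades(t1) ^ clades(t2))
```

Two trees on the same leaves that share no internal clade, such as `(("a", "b"), ("c", "d"))` and `(("a", "c"), ("b", "d"))`, are at distance 4, while contracting a single branch of a tree gives a tree at distance 1 from it.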

Gene Maps Linearization Using Genomic Rearrangement Distances

Author: Bowers J.
Danny Hermelin
El-Mabrouk N.
Eric Blais
Guillaume Blin
Jackson B.
Mathieu Blanchette
Nadia El-Mabrouk
Pierre Guillon
Sankoff D.
Yap I.
Publication venue: 'Mary Ann Liebert Inc'

Crossref

Reconstructing the History of Syntenies Through Super-Reconciliation

Author: Delabre Matteo
El-Mabrouk Nadia
Huber Katharina
Lafond Manuel
Moulton Vincent
Noutahi Emmanuel
Sautie Castellanos Miguel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2018

Classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes that each family evolves independently. While this assumption is reasonable for genes that are far apart in the genome, it is clearly not suited to genes grouped in syntenic blocks, which are more plausibly the result of concerted evolution. Here, we introduce the Super-Reconciliation model, which extends the traditional Duplication-Loss model to the reconciliation of a set of trees, accounting for segmental duplications and losses. From a complexity point of view, we show that the associated decision problem is NP-hard. We then give an exact exponential-time algorithm for this problem, assess its time efficiency on simulated datasets, and give a proof of concept on the opioid receptor genes.

University of East Anglia digital repository

Gene tree correction guided by orthology

Author: El-Mabrouk Nadia
Lafond Manuel
Semeria Magali
Swenson Krister
Tannier Eric
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. Results: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. Availability: Software and data available upon request to the corresponding author.

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Gene Family Evolution by Duplication, Speciation, and Loss

Author: Berglung A.-C.
Cedric Chauve
Jean-Philippe Doyon
Nadia El-Mabrouk
Vernot B.
Publication venue: 'Mary Ann Liebert Inc'

Crossref